Sampling & Resampling

Sampling

Four common sampling methods in applied machine learning

Pasted image 20240809092951.png|400

Simple Random Sampling

Systematic Sampling

Stratified Sampling

Cluster Sampling
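
The sampling methods listed above can be illustrated with a minimal NumPy sketch. The population of 100 unit IDs, the `groups` labels, and the sample sizes are all hypothetical, chosen only to make the differences between the methods visible.

```python
import numpy as np

rng = np.random.default_rng(0)
population = np.arange(100)   # hypothetical population: 100 unit IDs
groups = population // 25     # hypothetical labels: 4 groups of 25 units

# Simple random sampling: every unit has the same chance of being drawn.
simple = rng.choice(population, size=20, replace=False)

# Systematic sampling: pick a random start, then take every k-th unit.
k = 5
start = rng.integers(k)       # random offset in [0, k)
systematic = population[start::k]

# Stratified sampling: draw a fixed number of units from every group (stratum).
stratified = np.concatenate([
    rng.choice(population[groups == g], size=5, replace=False)
    for g in np.unique(groups)
])

# Cluster sampling: pick whole groups at random and keep all of their units.
chosen = rng.choice(np.unique(groups), size=2, replace=False)
cluster = population[np.isin(groups, chosen)]
```

In this sketch the same `groups` labels serve as strata in one case and as clusters in the other: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.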

Sampling error

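Sampling can introduce two main types of error: selection bias, where the sampling method systematically favors some observations over others, and sampling error, the random variation that arises because a sample is only a subset of the population.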


Resampling

The problem with sampling is that a single sample gives only a single estimate of the population parameter, with little idea of the variability or uncertainty in that estimate.

One way to address this is by estimating the population parameter multiple times from our data sample. This is called resampling.
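
As a rough illustration of estimating a parameter multiple times from one data sample, the sketch below uses the bootstrap to re-estimate a mean. The data are simulated, and the sample size and number of resamples are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=50.0, scale=10.0, size=200)  # hypothetical data sample

n_resamples = 2000
estimates = np.empty(n_resamples)
for i in range(n_resamples):
    # Draw a resample of the same size as the data, with replacement.
    resample = rng.choice(sample, size=sample.size, replace=True)
    estimates[i] = resample.mean()

point_estimate = sample.mean()           # the single estimate from the data
standard_error = estimates.std(ddof=1)   # spread of the repeated estimates
```

The spread of `estimates` gives a sense of how much the mean would vary from sample to sample, which the single `point_estimate` cannot show on its own.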

Several resampling methods exist, including permutation, bootstrap, jackknife, and cross validation.

The best figure telling the story of resampling:

Pasted image 20240809092603.png|500

Permutation

Bootstrap

Jackknife

Cross validation
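
The jackknife and cross validation both work by leaving part of the data out. The sketch below is a minimal illustration on simulated data: the jackknife part estimates the standard error of a mean, and the cross validation part only builds the train/test index splits. The sample size and `k = 5` folds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(size=30)   # hypothetical data sample
n = sample.size

# Jackknife: recompute the statistic n times, leaving one observation out each time.
loo_means = np.array([np.delete(sample, i).mean() for i in range(n)])
jackknife_se = np.sqrt((n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2))

# k-fold cross validation: shuffle the indices and split them into k folds;
# each fold is held out once while the remaining folds form the training set.
k = 5
folds = np.array_split(rng.permutation(n), k)
splits = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    splits.append((train_idx, test_idx))
```

For the mean, the jackknife standard error coincides with the usual formula `sample.std(ddof=1) / np.sqrt(n)`, which makes it a convenient sanity check. In practice each `(train_idx, test_idx)` pair would be used to fit a model on the training indices and score it on the held-out indices.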

Comparison between permutation and bootstrap:

The key difference is that the bootstrap samples with replacement, so an observation can appear several times in one resample, while permutation samples without replacement, so each observation appears exactly once and only its order or group label changes.

The permutation test is best suited to testing hypotheses, and the bootstrap is best suited to estimating confidence intervals.
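
The contrast can be sketched on a two-group comparison. This is a minimal illustration under assumed conditions: the two groups are simulated, the statistic is the difference in means, and the confidence interval uses the simple percentile method.

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(loc=0.0, scale=1.0, size=40)   # hypothetical group A
b = rng.normal(loc=1.0, scale=1.0, size=40)   # hypothetical group B
observed = b.mean() - a.mean()
n_iter = 5000

# Permutation test: shuffle the pooled data (without replacement) and re-split it,
# which simulates the null hypothesis that group labels do not matter.
pooled = np.concatenate([a, b])
perm_diffs = np.empty(n_iter)
for i in range(n_iter):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[a.size:].mean() - shuffled[:a.size].mean()
# Two-sided p-value; the +1 terms count the observed statistic itself.
p_value = (np.sum(np.abs(perm_diffs) >= abs(observed)) + 1) / (n_iter + 1)

# Bootstrap: resample each group with replacement and recompute the difference.
boot_diffs = np.empty(n_iter)
for i in range(n_iter):
    boot_a = rng.choice(a, size=a.size, replace=True)
    boot_b = rng.choice(b, size=b.size, replace=True)
    boot_diffs[i] = boot_b.mean() - boot_a.mean()
# 95% percentile confidence interval for the difference in means.
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
```

The permutation distribution is centered near zero because shuffling breaks any link between group and value, whereas the bootstrap distribution is centered near `observed` because each group is resampled separately.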